
Conversation

@featzhang (Member) commented on Jan 5, 2026

Purpose of this change

This PR introduces an optional Triton-based inference module under flink-models, enabling Apache Flink to invoke NVIDIA Triton Inference Server for model inference.

The integration is implemented at the runtime layer via the existing model provider SPI, allowing users to define Triton-backed models using CREATE MODEL and execute inference through ML_PREDICT, without requiring any changes to the planner, runtime semantics, or SQL behavior.
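For illustration, a minimal sketch (assuming the CREATE MODEL / ML_PREDICT syntax of Flink's model inference framework) of how a Triton-backed model might be registered and invoked from the Table API; the model name, schema, input table, and the 'provider'/'endpoint' option keys are placeholders rather than the exact options this PR defines:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

// Register a Triton-backed model; option keys here are illustrative placeholders.
tEnv.executeSql(
        "CREATE MODEL sentiment_model\n"
                + "INPUT (text STRING)\n"
                + "OUTPUT (label STRING)\n"
                + "WITH (\n"
                + "  'provider' = 'triton',\n"
                + "  'endpoint' = 'https://triton.example.com:8000/v2/models'\n"
                + ")");

// Invoke the model on an existing table through ML_PREDICT.
tEnv.executeSql(
        "SELECT text, label "
                + "FROM ML_PREDICT(TABLE reviews, MODEL sentiment_model, DESCRIPTOR(text))")
        .print();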

This module is designed as a reusable and extensible foundation for integrating external inference services into Flink’s model inference framework.


Summary of changes

  • Introduced a new flink-model-triton module under flink-models
  • Implemented a Triton model provider based on the existing model inference SPI
  • Added support for asynchronous and batched inference via Triton's HTTP/REST API
  • Provided configurable request batching and execution behavior aligned with Triton’s inference model
  • Added basic documentation describing usage and configuration

Verification

  • Verified module compilation and packaging
  • Added unit tests for the Triton model provider factory
  • Manually validated inference requests against a local Triton server

Impact assessment

This change is fully optional and isolated under flink-models.

  • API changes: No
  • Planner changes: No
  • Runtime changes: No
  • SQL semantics changes: No

Documentation

  • Added Triton-related documentation under docs/connectors/models/triton.md
  • Updated SQL model inference documentation to list Triton as a supported provider

Related issues

@flinkbot (Collaborator) commented on Jan 5, 2026

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

github-actions bot added the community-reviewed label (PR has been reviewed by the community) on Jan 5, 2026
featzhang changed the title from "[FLINK-38857][Model] Introduce a Triton inference module under flink-models for batch-oriented AI inference" to "[FLINK-38857][Model] Introduce a Triton inference module under flink-models" on Jan 18, 2026
.withDescription(
        Description.builder()
                .text(
                        "Reserved for future use (v2+). Currently has NO effect in v1. "
Contributor commented:

I am curious - what is the benefit of adding something we do not use yet? It would seem simpler to add it when we use it.

public static final ConfigOption<Long> TIMEOUT =
        ConfigOptions.key("timeout")
                .longType()
                .defaultValue(30000L)
Contributor commented:

we should use the Duration type for this
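For reference, a minimal sketch of the Duration-based variant this suggests, keeping the existing 30-second default; the option key and semantics are taken from the hunk above, the description text is illustrative:

import java.time.Duration;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

// Duration-typed variant of the option above; users can then configure
// values such as '30 s' or '500 ms' instead of a raw millisecond count.
public static final ConfigOption<Duration> TIMEOUT =
        ConfigOptions.key("timeout")
                .durationType()
                .defaultValue(Duration.ofSeconds(30))
                .withDescription("Timeout for a single Triton inference request.");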

Description.builder()
        .text(
                "Full URL of the Triton Inference Server endpoint, e.g., %s",
                code("http://localhost:8000/v2/models"))
Contributor commented:

Worrying that the example URL is http, i.e. insecure.
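One way to address this would be to show a TLS endpoint in the example; the hostname below is a placeholder:

Description.builder()
        .text(
                "Full URL of the Triton Inference Server endpoint, e.g., %s",
                // prefer an HTTPS example so copy-pasted configurations default to TLS
                code("https://triton.example.com:8000/v2/models"))
        .build();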

.booleanType()
.defaultValue(false)
.withDescription(
        "Whether this is the start of a sequence for stateful models.");
Contributor commented:

it would be useful to have a reference as to what a sequence is in this context
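If a pointer helps, a possible wording, assuming the option maps onto the sequence_start request parameter of Triton's sequence batcher for stateful models (worth verifying against the Triton architecture documentation before adopting):

.withDescription(
        "Whether this request starts a new sequence for stateful models. "
                + "Maps to the 'sequence_start' parameter handled by Triton's "
                + "sequence batcher; see Triton's stateful-model documentation.");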

ConfigOptions.key("compression")
.stringType()
.noDefaultValue()
.withDescription("Compression algorithm to use (e.g., 'gzip').");
@davidradl (Contributor) commented on Jan 30, 2026:

can we give a reference or list the valid values for this?
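A possible refinement, assuming the value is passed through to Triton's HTTP client; the value set (gzip and deflate) is an assumption that should be confirmed against the Triton client documentation for the targeted version:

public static final ConfigOption<String> COMPRESSION =
        ConfigOptions.key("compression")
                .stringType()
                .noDefaultValue()
                .withDescription(
                        "Request/response compression algorithm passed to the Triton "
                                + "HTTP client; supported values are 'gzip' and 'deflate'.");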


public static final ConfigOption<String> CUSTOM_HEADERS =
        ConfigOptions.key("custom-headers")
                .stringType()
@davidradl (Contributor) commented on Jan 30, 2026:

Headers are MAP<STRING, ARRAY>. Can we avoid using JSON and use the standard Flink map type, passing lists of strings the way list types would expect?
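A minimal sketch of that suggestion using Flink's built-in map option type instead of a JSON string; note this flattens each header to a single value (Map<String, String>), so multi-valued headers would still need a delimiter convention, which is left open here:

import java.util.Map;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public static final ConfigOption<Map<String, String>> CUSTOM_HEADERS =
        ConfigOptions.key("custom-headers")
                .mapType()
                .noDefaultValue()
                .withDescription(
                        "Custom HTTP headers sent with every inference request, "
                                + "e.g. 'custom-headers' = 'Authorization:Bearer abc,X-Trace-Id:123'.");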
